LONG-TERM GOALS

This one-year effort will focus on the transition of FERI's machine learning algorithms for
HyperSpectral Imagery (HSI) in Very Shallow Waters (VSW) into a distributable code set. This will
provide a stable code platform for the application and transition of machine learning-based
hyperspectral classification techniques into 6.3/6.4 programs. (This work was funded mid-year 2008.)

OBJECTIVES 

Our objective is to focus on three areas of application research and transition. First, we will transition
our machine learning-based algorithms and computer code for the determination of bathymetry,
bottom type, and water column Inherent Optical Properties (IOPs) from HyperSpectral Imagery (HSI)
into a deliverable Message Passing Interface (MPI) program that may be easily used by other
researchers and military operators. Second, we will use this program to determine the impacts of the
granularity of the classification database on the inversion of bathymetry, bottom type, and IOPs.
Third, we will move beyond single-pixel HSI inversion to the use of spatial context-filtering to remove
the pixel-to-pixel noise inherent in the HSI data.

APPROACH 

Task 1

In previous work, a Look-Up Table (LUT) algorithm was used to accurately predict bathymetry
(Mobley et al. 2002, Bissett et al. 2004, Bissett et al. 2005, Mobley et al. 2005, Lesser and Mobley,
2008). The LUT approach belongs to a larger body of artificial intelligence work concerned with
algorithms and techniques that “teach” machines to learn from the examination of data and rules. This
body of work is aptly called “machine learning”, and some of its techniques include decision trees,
genetic algorithms, and neural networks. More specifically, the LUT approach is a special case of the
k-Nearest Neighbor (kNN) algorithm, which is in the family of supervised learning algorithms.

Our use of the kNN algorithm maps a single HSI remote sensing reflectance vector, Rrs(λ), onto a
database of estimated Rrs(λ). This database is created by providing the attributes of bathymetry,
spectral bottom reflectance, and spectral IOPs to the radiative transfer routines of Ecolight (a
high-speed variant of Hydrolight; Mobley, 1994). We select the classification of the measured Rrs
vector based on the best match of measured Rrs(λ) to estimated Rrs(λ). The LUT algorithm is based
on a single best fit for our classification, i.e. k = 1. However, more recent work suggested that we
could achieve a better classification by selecting a larger number for k, e.g. k = 50 (Bissett et al.
2006a). This larger value of k provides better accuracy and precision, and also allows us to create
confidence intervals for our classifications of bathymetry.
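As an illustration of how k > 1 neighbors can yield an interval as well as a point estimate, the following minimal Python sketch summarizes the depths attached to the k nearest database spectra. The report does not state how the confidence intervals are formed, so the empirical percentile interval used here, the function name, and the sample depths are all assumptions made for illustration.

```python
import statistics

def depth_estimate_with_interval(neighbor_depths, coverage=0.90):
    """Summarize the depths of the k nearest database spectra.

    Returns (median, low, high), where [low, high] is an empirical
    percentile interval (an assumed choice, not the report's method).
    """
    depths = sorted(neighbor_depths)
    n = len(depths)
    if n == 0:
        raise ValueError("at least one neighbor depth is required")
    tail = (1.0 - coverage) / 2.0
    lo_idx = round(tail * (n - 1))   # nearest-rank index of the lower bound
    hi_idx = (n - 1) - lo_idx        # symmetric index of the upper bound
    return statistics.median(depths), depths[lo_idx], depths[hi_idx]

# Hypothetical depths (m) of k = 10 nearest neighbors, one of them an outlier.
depths = [2.0, 2.5, 2.5, 3.0, 3.0, 3.0, 3.5, 3.5, 4.0, 9.0]
estimate, low, high = depth_estimate_with_interval(depths, coverage=0.80)
```

With k = 1 no such interval can be formed, because only a single database depth is retrieved for each pixel.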

When classifying new spectra, the distance or angle between each measured spectrum and each
estimated spectrum in the database is calculated. The k nearest neighbors to that spectrum (those
having the smallest distances or angles) are considered sufficiently qualified to predict the
corresponding attributes of bathymetry, bottom type, and IOP set. We have used the following metrics
for the calculation of distance (Euclidean, Manhattan, Chebyshev, Canberra, and Bray-Curtis) and/or
angle (Angular Separation and Correlation Coefficient). In general, our applications suggest that the
Manhattan distance and the Correlation Coefficient angle metrics are the best metrics to use for this
algorithm. Once the set of nearest neighbors is determined, the attribute (e.g. bathymetry) of a pixel
may be determined by a majority vote from the k nearest neighbor vectors. In the event of a tie, a
prediction is made randomly from amongst the tied majority classes.
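The matching and voting steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project code: the function names, the database layout (a list of spectrum and label pairs), the example spectra, and the use of 1 minus the correlation coefficient as a dissimilarity are all assumptions.

```python
import math
import random
from collections import Counter

def manhattan(a, b):
    """Manhattan (city-block) distance between two spectra."""
    return sum(abs(x - y) for x, y in zip(a, b))

def correlation_distance(a, b):
    """1 - Pearson correlation coefficient (assumed form of the metric)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1.0 - num / den if den > 0 else 1.0

def knn_classify(measured, database, k=1, metric=manhattan, rng=random):
    """Majority vote over the k database spectra nearest to `measured`.

    `database` is a list of (spectrum, label) pairs; ties between the
    most frequent labels are broken at random.
    """
    ranked = sorted(database, key=lambda entry: metric(measured, entry[0]))
    votes = Counter(label for _, label in ranked[:k])
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    return rng.choice(winners)

# Hypothetical three-band spectra labeled with depth classes.
database = [
    ([0.010, 0.021, 0.015], "5 m"),
    ([0.011, 0.020, 0.016], "2 m"),
    ([0.009, 0.019, 0.015], "2 m"),
    ([0.030, 0.040, 0.035], "10 m"),
]
measured = [0.010, 0.020, 0.015]
single_best = knn_classify(measured, database, k=1)   # LUT-style retrieval
voted = knn_classify(measured, database, k=3)         # majority of 3 neighbors
```

In this toy database the single closest spectrum carries a different label than the majority of its neighbors, so the k = 1 and k = 3 retrievals differ.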

The computer code used in our creation of the estimated Rrs(λ) database and the spectral matching of
the measured versus estimated Rrs(λ) is functional for scientific research; however, it is not well
developed for transition to others for use in testing and evaluation applications. Our first task of this
project will build upon our past research efforts to provide a Message Passing Interface (MPI)
executable version of our kNN workbench for the inversion of hyperspectral imagery. This code
will be distributed to research and military partners for testing and evaluation purposes, and will also
be used to complete Tasks 2 and 3.

Task  2 

The spectrum for one particular depth, bottom type, and set of inherent optical properties may closely
match a multitude of spectra with many different attributes (Figure 1). The selection of a single
nearest neighbor may produce noisy predictions because of the noise in both the measured and
estimated Rrs(λ). The total prediction noise is a function of the noise associated with the measured
Rrs(λ), which contains components of sensor and environmental noise, and the noise associated with
the estimation of Rrs(λ) in the training database. This noise is evident in the “speckling” that may be
associated with these inversion techniques (Figure 2). The use of kNN algorithms works to reduce
the noise of the prediction by increasing the probability that a spectrum presented for classification
will come from the majority class of proximally located spectral vectors, rather than a single “lucky”
spectrum. In this case, rather than selecting the single database spectrum “O” that is closest to the
measured spectrum (represented by the square in Figure 1), a majority vote of all of the nearest
neighbors around the square is used to make the prediction of the attribute (e.g. bathymetry) at that
pixel location. Choosing the majority class creates a less variable space from which to make a
decision, making it less likely that small amounts of noise in the spectra will produce different
classifications.

However, as the size of the training database increases (through the increase in the number of
bathymetry depths, bottom types, or IOP sets), the number of nearest neighbors also increases
(Figure 3). This in turn causes a problem with “non-uniqueness” in the selection of the appropriate
class and its component attribute, which increases the noise in the map of the estimated attribute (e.g.
bathymetry). It therefore becomes very important to have the appropriate “granularity”, or the
proper step size in the discrete selection of attributes that are used in the creation of the training
database. In this specific case, it means that we need to be selective in the number of depth levels,
bottom types, and IOP sets that we use to create the estimated Rrs(λ) database. The second Task of
this project will be to use the code from Task 1 to rapidly test the impacts of granularity of
attribute selection on the accuracy and precision of bathymetry estimated from our kNN code
and the HSI data from Horseshoe Reef and St. Joseph Bay, FL* (Bissett et al. 2006b).
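To make the notion of granularity concrete, the short Python sketch below enumerates the attribute combinations that would each need one estimated spectrum in the training database. The depth range, step sizes, bottom type names, and IOP set names are hypothetical, and the radiative transfer calculation that would turn each combination into a spectrum is not shown.

```python
from itertools import product

def depth_levels(min_depth, max_depth, step):
    """Discrete depth levels from min_depth to max_depth at a fixed step."""
    n_steps = int(round((max_depth - min_depth) / step))
    return [round(min_depth + i * step, 6) for i in range(n_steps + 1)]

def attribute_grid(depths, bottom_types, iop_sets):
    """Every (depth, bottom type, IOP set) combination in the database."""
    return list(product(depths, bottom_types, iop_sets))

# Hypothetical attribute choices for illustration only.
bottoms = ["sand", "seagrass", "dark sediment"]
iops = ["clear water", "turbid water"]

coarse = attribute_grid(depth_levels(0.0, 10.0, 0.5), bottoms, iops)
fine = attribute_grid(depth_levels(0.0, 10.0, 0.25), bottoms, iops)
print(len(coarse), len(fine))
```

Halving the depth step roughly doubles the number of database entries, and with it the number of candidate neighbors that may match a given measured spectrum.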


Figure 1. Xs and Os are the classes of examples belonging to the training database. The measured
spectrum (the square) is closer to the O than to any X. In kNN, multiple nearest neighbors are used to
vote on the appropriate class. If k = 1, class O is chosen. If k > 1, the class is chosen by a vote
amongst the retrieved neighbors, here the Xs. The total number of Xs depends on the value of k, and
the retrieved set will also include the O. The estimate of the attribute may then be calculated from
any number of statistical calculations on the retrieved set, e.g. mean, majority vote, etc.


Task  3 

The problem of sensor and environmental noise is a critical issue in the retrieval of accurate
bathymetry from maps of HSI data. There are many sources of environmental noise in the collection
of sensor-measured radiance, for example surface waves that alter the reflection surface and path
length to the bottom reflectance target. These surface noise effects are commingled with the
atmospheric and illumination correction noise to produce spatially varying Rrs(λ) over areas with
identical bathymetry, bottom types, and IOPs (Figure 2). In order to reduce the impacts of this
environmentally generated noise component, we should use the spatial context of the measured
spectrum during the selection of the nearest neighbor classes, and subsequent estimate of the attribute
of interest.


* The use of St. Joseph Bay, FL data will depend on acquiring accurate bathymetry from the State of Florida. If we do not
receive bathymetry of sufficient quality, we will focus on the Horseshoe Reef imagery.

Figure 2. LUT bathymetry estimate for Horseshoe Reef, Bahamas. The black dots show the
locations of the acoustic pings. The color-coded depths are for the unconstrained LUT retrieval (k =
1) applied to the entire image. The speckling in bathymetry is evident throughout the image.


Figure 3. Xs and Os are the classes of examples belonging to the training database and are the
same as in Figure 1. The A’s are additional classes resulting from increasing the depth resolution, as
well as the number of bottom types and IOP sets. In the case discussed in the text, these A’s may
contain attributes that are unrepresentative of the actual values and represent a non-unique
solution to this inversion problem. The selection of the appropriate depth intervals or range of
bottom types and IOP sets is important to reducing this non-uniqueness. The term granularity is
used to describe the separation between the discrete levels in the attributes.

Heretofore we have done point- or pixel-specific classification of HSI data. That is, each pixel is
classified (for depth, bottom type, and water IOPs) independently of its neighbors, and only the
spectral character of the pixel is used in its classification. Task 3 will be to evaluate spatial
context-sensitive classification, which means that we will incorporate information about the spatial
neighborhood (the spatial context) of a pixel to assist with its classification. Context-sensitive
classification is often used in traditional terrestrial thematic mapping (e.g., Richards and Jia, 2006,
§8.8), and some of those techniques may be beneficial for our oceanic problem.

This Task will evaluate two types of context-filtering: (1) pre-filtering of the Rrs(λ) spectra
before classification, and (2) context-filtering of the retrieved attributes after classification. The
first type of context-filtering seeks to reduce the noise in Rrs(λ) spectra by replacing the spectrum
value at each wavelength with the median value of the spectra in a spatial area surrounding the pixel of
interest, say a 3 x 3 grid of pixels centered on the one of interest. This spatial filter is applied
wavelength by wavelength. At wavelengths where Rrs(λ) is mostly signal, the final spectrum will not
change by much. At wavelengths where Rrs(λ) is noisy, the noise in the surrounding pixels will tend
to average out and the final spectrum values over the entire image area will be less noisy than the
original.
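The pre-filtering step can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the image cube is a nested list indexed as cube[row][column][band], the function name is hypothetical, and pixels at the image edge simply use the part of the window that falls inside the image (the report does not specify edge handling).

```python
import statistics

def median_prefilter(cube, radius=1):
    """Replace each band value with the median over a spatial window.

    radius=1 gives the 3 x 3 window; the filter is applied one
    wavelength (band) at a time.
    """
    rows, cols = len(cube), len(cube[0])
    bands = len(cube[0][0])
    out = [[[0.0] * bands for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Spectra of all pixels in the window, clipped at the image edge.
            window = [cube[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            for b in range(bands):
                out[r][c][b] = statistics.median(spec[b] for spec in window)
    return out

# Hypothetical 3 x 3 image with two bands; the center pixel has a noise
# spike in the second band only.
cube = [[[0.02, 0.01] for _ in range(3)] for _ in range(3)]
cube[1][1] = [0.02, 0.50]
filtered = median_prefilter(cube)
```

In this example the first band is unchanged everywhere, while the spike in the second band of the center pixel is replaced by the value of its neighbors.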

The second type of context-filtering involves post-processing the retrievals themselves, rather than the
original image spectra. In the case of real-valued attributes, such as bathymetry, we can apply a
median filter to the retrieved depth. For bottom type and IOP set, the way forward is less clear. Each
of these attributes is assigned a type with a specific vector (or set of vectors in the case of IOPs) of
spectral values. How we filter “Dark Sediment” with “Sparse Vegetation” or “Highly absorbing and
scattering waters #1” with “Case 1, chlorophyll a = 0.5 mg m^-3” will be a challenge. It may require
some iterative solution that context-filters bathymetry first, and then solves the kNN again using a
constrained bathymetry solution approach. It may also be highly dependent on the granularity study in
Task 2. These are the issues that we will address in this Task.
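One possible reading of the iterative approach is sketched below in Python: the retrieved depth map is median filtered, and the filtered depth at a pixel is then used to restrict the database entries considered in a second kNN pass. The function names, the dictionary layout of the database entries, the tolerance value, and the edge handling are all assumptions; the report leaves the actual procedure open.

```python
import statistics

def median_filter_depth(depth_map, radius=1):
    """Median filter a 2-D map of retrieved depths (window clipped at edges)."""
    rows, cols = len(depth_map), len(depth_map[0])
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [depth_map[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            filtered[r][c] = statistics.median(window)
    return filtered

def constrain_database(database, filtered_depth, tolerance):
    """Keep only database entries whose depth is near the filtered depth."""
    return [entry for entry in database
            if abs(entry["depth"] - filtered_depth) <= tolerance]

# Hypothetical retrieved depths (m) with one speckle at the center pixel.
depth_map = [[3.0, 3.0, 3.0],
             [3.0, 9.0, 3.0],
             [3.0, 3.0, 3.0]]
smoothed = median_filter_depth(depth_map)

# Hypothetical database entries; only those near the smoothed depth remain
# as candidates for a second, depth-constrained kNN pass.
database = [{"depth": 2.5, "bottom": "Dark Sediment"},
            {"depth": 3.0, "bottom": "Sparse Vegetation"},
            {"depth": 9.0, "bottom": "Dark Sediment"}]
candidates = constrain_database(database, smoothed[1][1], tolerance=1.0)
```

Constraining depth in this way sidesteps the question of how to filter categorical attributes directly, since bottom type and IOP set are re-estimated from the reduced candidate set rather than smoothed.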

WORK  COMPLETED 

Task 1 has been completed, and the serial and MPI versions of our optimized machine learning code
are available for the v0.1.0 release. The code will be distributed in a generic Red Hat Package Manager
(RPM; http://en.wikipedia.org/wiki/RPM_Package_Manager) format for installation on the Red Hat,
Fedora, and CentOS versions of Linux.

IMPACT/APPLICATIONS 

This effort will deliver an application for testing and evaluation of our machine learning approaches to
bathymetry estimation in Very Shallow Waters (VSW). While it is being demonstrated on
hyperspectral imagery, the techniques and computer code may be used with any set of spectral
reflectance data. As such, the deliverables from this effort will allow others to create maps of depths,
bottom types, and water clarity from a variety of airborne and space-based spectral sensors planned for
operational deployment.

RELATED  PROJECTS 

This work is being conducted in conjunction with Dr. Curtis D. Mobley at Sequoia Scientific, Inc.,
who is funded under this effort for the collaboration. The techniques developed here are now being
applied to imagery of Australian coastal waters in a comparison of several different hyperspectral
remote sensing algorithms for a variety of environments. That comparison study is being led by A.
Dekker of CSIRO.